Assessing the performance of classifiers when classes arise from a continuum
نویسنده
چکیده
The situation where classes arise by dividing the range of a continuous response variable into intervals is discussed. The focus is on assessing the performance of classi'ers. Due to the underlying continuum, all misclassi'cations are not equally grave. The probability of misclassi'cation (pmc) is not optimal in this situation. An alternative performance measure, the squared error rate (sqerr) is proposed. It is related to the mean squared error of regression, and penalises misclassi'cations according to their severity. Also, because of measurement errors in the response variable, there are misallocated class labels in data sets used for training and testing. Estimates of the pmc and the sqerr are developed for this situation. The estimates are tested and compared on a real data set and in a simulation. c © 2003 Elsevier B.V. All rights reserved.
منابع مشابه
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملImprovement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
متن کاملImproving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
متن کاملEvaluation of Classifiers in Software Fault-Proneness Prediction
Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...
متن کاملSupport Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computational Statistics & Data Analysis
دوره 46 شماره
صفحات -
تاریخ انتشار 2004